Cherry-pick #21258 to 7.9: o365input: Restart after fatal error #21386
Update the o365input to restart the input after a fatal error is encountered, for example an authentication token refresh error or a parsing error. This enables the input to be more resilient against transient errors. Before this patch, the input would index an error document and terminate. Now it will index an error and restart after a fixed timeout of 5 minutes. (cherry picked from commit 8716d98)
Pinging @elastic/siem (Team:SIEM)
💔 Tests Failed
CI failures unrelated.
elastic#21386) Update the o365input to restart the input after a fatal error is encountered, for example an authentication token refresh error or a parsing error. This enables the input to be more resilient against transient errors. Before this patch, the input would index an error document and terminate. Now it will index an error and restart after a fixed timeout of 5 minutes. (cherry picked from commit c723c1e)
Cherry-pick of PR #21258 to 7.9 branch. Original message:
What does this PR do?
Updates o365input to restart the input after a fatal error is encountered, for example an authentication token refresh error or a parsing error. This enables the input to be more resilient against errors.
Before this patch, the input would index an error document and terminate. Now it will index an error and restart after a fixed timeout of 5 minutes.
Why is it important?
Some users are reporting that the o365 module stops ingesting events after some days. In all cases it's been observed that the input terminated at some point due to errors contacting the Azure authentication server to refresh a token.
Checklist
[ ] I have made corresponding changes to the documentation
[ ] I have made corresponding change to the default configuration files
[ ] I have added tests that prove my fix is effective or that my feature works
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Author's Checklist
How to test this PR locally
Testing token refresh errors directly is difficult because tokens are refreshed only once every ~12h. However, the restart behavior can be tested by starting Filebeat without an internet connection.